PARALLEL ARITHMETIC APPARATUS, ENTERTAINMENT APPARATUS,
PROCESSING METHOD, COMPUTER PROGRAM AND 
SEMICONDUCTOR DEVICE 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority 
from the prior Japanese Patent Applications No. 2000-335787, filed 
November 2, 2000, and No. 2001-318590 filed October 16, 2001, the 
entire contents of both of which are incorporated herein by reference. 

BACKGROUND OF THE INVENTION

Field of the Invention 
The present invention relates to a technology for carrying out
processing using a plurality of arithmetic units in parallel, for example, a
parallel arithmetic processing technology for carrying out, at high speed,
processing such as the geometry processing executed in computer
graphics.

Description of the Related Art 
In three-dimensional computer graphics, an object to be displayed is
modeled as a set of a plurality of basic graphics (polygons). The
vertices of a polygon are expressed by four-dimensional coordinates
(x, y, z, w) using homogeneous coordinates. The coordinates of the
polygon vertices are subjected to coordinate transformation according to
the viewpoint coordinates and to perspective transformation, etc.
according to distance. That is, the coordinates of the polygon vertices
are transformed in such a way that farther objects appear smaller. This
series of processing is called "geometry processing".

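By way of illustration only, the effect of perspective transformation on
homogeneous coordinates may be sketched in Python as follows. The sketch
assumes the common convention in which the transformed w component carries
the distance from the viewpoint and the displayed position is obtained by
dividing by w; the function name perspective_divide is hypothetical and does
not appear in this Specification.

```python
def perspective_divide(vertex):
    """Map homogeneous (x, y, z, w) to (x/w, y/w, z/w); w must be non-zero."""
    x, y, z, w = vertex
    return (x / w, y / w, z / w)


# Two vertices with the same x and y offsets but different distances
# (assumption: w grows with distance from the viewpoint).
near = perspective_divide((2.0, 2.0, 1.0, 1.0))  # (2.0, 2.0, 1.0)
far = perspective_divide((2.0, 2.0, 4.0, 4.0))   # (0.5, 0.5, 1.0)
```

In this sketch the farther vertex lands closer to the center of the screen,
which is the sense in which farther objects appear smaller.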
There are various modes of geometry processing. For example, a
matrix operation using a 4 x 4 transformation matrix, etc. is performed
for polygon rotation, expansion, contraction, perspective projection and
translation, or an inner product operation is carried out to determine
brightness on a light-receptive surface, etc. These matrix operations
and inner product operations require repeated sum-of-products
operations.
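As a minimal sketch of why both kinds of operation reduce to repeated
sum-of-products operations, the following Python fragment expresses a 4 x 4
matrix operation and an inner product with one shared multiply-accumulate
loop. The names sum_of_products and transform, and the sample values, are
illustrative assumptions only.

```python
def sum_of_products(a, b):
    """Multiply corresponding elements and accumulate the products."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def transform(matrix, vertex):
    """4 x 4 matrix operation: one sum-of-products per matrix row."""
    return [sum_of_products(row, vertex) for row in matrix]


# Expansion by a factor of 2 applied to the vertex (1, 2, 3, 1).
scale = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved = transform(scale, [1.0, 2.0, 3.0, 1.0])  # [2.0, 4.0, 6.0, 1.0]

# Inner product of a surface normal and a light vector.
brightness = sum_of_products([0.0, 0.0, 1.0, 0.0], [0.0, 0.6, 0.8, 0.0])  # 0.8
```

The matrix operation needs four independent accumulations, whereas the
inner product needs a single accumulation over all four components.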

In three-dimensional computer graphics, floating-point processing,
conventionally used in high-end systems, is now also used in
entertainment apparatuses that generate entertainment images such as
video game images and in fields with severe cost constraints such as
portable information terminals. This is because floating-point
processing broadens the dynamic range of data and facilitates
programming, and is therefore suited to sophisticated processing.

For carrying out matrix operations on floating-point numbers, a
parallel arithmetic apparatus is available which incorporates a plurality
of floating-point sum-of-products operators (FMAC: Floating Multiply
Accumulator) and carries out matrix operations efficiently. The ability
of the parallel arithmetic apparatus to carry out operations in parallel
using a plurality of FMACs increases the processing speed.

Apparatuses that carry out three-dimensional image processing, such
as entertainment apparatuses and personal computers, can obtain fine and
realistic three-dimensional images at high speed by carrying out the
aforementioned geometry processing using such a parallel arithmetic
apparatus.

If this parallel arithmetic apparatus is provided with four FMACs
placed in parallel, the parallel arithmetic apparatus can easily perform
matrix operations using a 4 x 4 transformation matrix as shown in
mathematical expression 1. However, it is difficult to perform an inner
product operation between a vector A (Ax, Ay, Az, Aw) and a vector B (Bx,
By, Bz, Bw) shown in mathematical expression 2.

This is because the coordinates X, Y, Z and W subject to
processing are operated on independently, in a one-to-one correspondence
with the four FMACs.

This will be explained more specifically. 

When the matrix operation in mathematical expression 1 is carried
out, component values corresponding to one row of the transformation
matrix and coordinate values of the coordinates to be transformed are fed
into each of the four FMACs. The component values of the transformation
matrix and the coordinate values entered are subjected to
a sum-of-products operation to perform the matrix operation. For
example, component values (M11, M12, M13, M14) on the first row of the
transformation matrix and coordinate values of the coordinates (Vx, Vy,
Vz, Vw) are subjected to a sum-of-products operation to calculate
"M11·Vx+M12·Vy+M13·Vz+M14·Vw". Since each of the four FMACs
carries out a similar sum-of-products operation, matrix operations are
completed efficiently. In this Specification, "·" denotes multiplication.

When the inner product operation in mathematical expression 2 is
carried out, each of the four FMACs is associated with one of the
components X, Y, Z and W. Therefore, Ax and Bx, Ay and By, Az and Bz,
and Aw and Bw are input to the four FMACs respectively. Ax·Bx, Ay·By,
Az·Bz and Aw·Bw are calculated as their respective outputs. Thus,
executing mathematical expression 2 requires a separately provided adder
for adding up the outputs of the four FMACs, which increases the scale of
the circuit.

Thus, the conventional parallel arithmetic apparatus can process
matrix operations efficiently, but the FMACs provided in parallel cannot
by themselves perform vector inner product operations, so the
conventional parallel arithmetic apparatus may require an additional
adder.
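The limitation described above can be modeled, purely as an illustrative
sketch, by the following Python fragment. Each lane stands for one FMAC
that sees only its own component pair, and the final summation stands for
the separately provided adder. The function name
conventional_inner_product is hypothetical.

```python
def conventional_inner_product(a, b):
    """Model four independent lanes followed by a separate adder."""
    # Each lane (FMAC) only ever sees its own pair: (Ax, Bx), (Ay, By), ...
    lane_outputs = [ai * bi for ai, bi in zip(a, b)]
    # The lanes cannot combine their outputs, so an extra adder is needed.
    extra_adder_output = sum(lane_outputs)
    return lane_outputs, extra_adder_output


products, result = conventional_inner_product(
    [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
)
# products == [5.0, 12.0, 21.0, 32.0], result == 70.0
```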

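(MATHEMATICAL EXPRESSION 1)

\[
\begin{bmatrix}
M_{11} & M_{12} & M_{13} & M_{14} \\
M_{21} & M_{22} & M_{23} & M_{24} \\
M_{31} & M_{32} & M_{33} & M_{34} \\
M_{41} & M_{42} & M_{43} & M_{44}
\end{bmatrix}
\begin{bmatrix} V_x \\ V_y \\ V_z \\ V_w \end{bmatrix}
=
\begin{bmatrix}
M_{11} \cdot V_x + M_{12} \cdot V_y + M_{13} \cdot V_z + M_{14} \cdot V_w \\
M_{21} \cdot V_x + M_{22} \cdot V_y + M_{23} \cdot V_z + M_{24} \cdot V_w \\
M_{31} \cdot V_x + M_{32} \cdot V_y + M_{33} \cdot V_z + M_{34} \cdot V_w \\
M_{41} \cdot V_x + M_{42} \cdot V_y + M_{43} \cdot V_z + M_{44} \cdot V_w
\end{bmatrix}
\]

(MATHEMATICAL EXPRESSION 2)

\[
(A_x, A_y, A_z, A_w) \cdot (B_x, B_y, B_z, B_w)
= A_x \cdot B_x + A_y \cdot B_y + A_z \cdot B_z + A_w \cdot B_w
\]

SUMMARY OF THE INVENTION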
It is a main object of the present invention to provide a parallel
arithmetic apparatus capable of carrying out vector inner product
operations easily while carrying out matrix operations as efficiently as the
conventional parallel arithmetic apparatus.

In order to solve the above-described problems, the parallel
arithmetic apparatus according to the present invention comprises a
plurality of pairs, each consisting of recording means for recording
arithmetical elements to be operated on and operating means for
performing sum-of-products operations based on the arithmetical elements
recorded in the recording means, wherein selecting means, which selects
one of the recording means of all the pairs and inputs the arithmetical
elements recorded in the selected recording means to the operating means
of its own pair, is inserted between the recording means and the
operating means of any one pair.

The parallel arithmetic apparatus of the present invention can,
when the selecting means selects the recording means of the pair in which
the selecting means itself is inserted, perform operations using
arithmetical elements independent of each other in each pair. That is, it
is possible to carry out matrix operations as in the conventional art.

On the other hand, when the selecting means selects one
recording means after another from among all the recording means in a
round-robin fashion, it is possible to perform operations using the
arithmetical elements recorded in the recording means of every pair.
That is, the parallel arithmetic apparatus of the present invention can
perform inner product operations easily without the need to use other
circuits such as adders.
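The two behaviors of the selecting means can be sketched as follows. This
is a simplified software model under stated assumptions, not the circuit
itself: each recording means is modeled as a tuple of two arithmetical
elements, and the names select, inner_product and the mode strings are
hypothetical.

```python
def select(registers, own_index, mode, step):
    """Return the register contents routed to the operating means."""
    if mode == "matrix":
        # Always the recording means of the selector's own pair.
        return registers[own_index]
    if mode == "inner_product":
        # One recording means after another, in round-robin order.
        return registers[step % len(registers)]
    raise ValueError(f"unknown mode: {mode}")


def inner_product(registers, own_index=0):
    """Accumulate in a single operating means, with no separate adder."""
    acc = 0.0
    for step in range(len(registers)):
        a, b = select(registers, own_index, "inner_product", step)
        acc += a * b
    return acc


registers = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (4.0, 8.0)]
total = inner_product(registers)  # 1*5 + 2*6 + 3*7 + 4*8 = 70.0
```

Because the accumulation happens inside one operating means, the products
never have to leave the pairs to be summed elsewhere.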

In this parallel arithmetic apparatus, temporary recording means for
temporarily recording the arithmetical elements recorded in the recording
means of a pair in which the selecting means is not inserted can also be
inserted between the recording means and the operating means of that
pair. In this case, the selecting means is constructed in such a way as
to input the arithmetical elements recorded in the temporary recording
means to the operating means when the recording means of the pair in
which the selecting means is not inserted is selected.

Inserting the temporary recording means eliminates the need to 
occupy the output ports of the recording means when arithmetical 
elements are taken in from the recording means. This allows the 
recording means and operating means of the pair in which the temporary 
recording means is inserted to perform other processing. 
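A rough software analogy of the temporary recording means is given below.
It assumes that the temporary recording means simply holds a copy of the
arithmetical elements, so that the original recording means may be
overwritten while the inner product is still being accumulated. The
function name inner_product_with_latches and the variable latches are
hypothetical.

```python
def inner_product_with_latches(registers):
    """Accumulate from copies so the other pairs' registers are freed."""
    # Copy the elements of every pair except pair 0 into temporary storage.
    latches = list(registers[1:])
    # The other pairs may now reuse their registers; model this by overwriting.
    for i in range(1, len(registers)):
        registers[i] = (0.0, 0.0)
    # Pair 0 reads its own register, then the temporary copies in turn.
    acc = 0.0
    for a, b in [registers[0]] + latches:
        acc += a * b
    return acc


regs = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (4.0, 8.0)]
value = inner_product_with_latches(regs)  # 70.0, despite the overwrites
```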

In the parallel arithmetic apparatus, the recording means of all
pairs record, during a matrix operation, a first arithmetical element to be
subjected to the matrix operation, and during a vector inner product
operation, a second arithmetical element to be subjected to the vector
inner product operation. The selecting means is constructed in such a
way as to input, during the matrix operation, the first arithmetical
element from the recording means of its own pair to the operating means
of its own pair, and, during the inner product operation, to select the
recording means of all the pairs one by one in a round-robin fashion and
input the second arithmetical element from the selected recording means
to the operating means of its own pair.

Each of the operating means performs an operation whose content is
independently assigned to its pair, using the arithmetical elements
recorded in the recording means of the pair. When this parallel arithmetic
apparatus is used for three-dimensional computer graphics, such an
operation is associated with any one of the components of four-dimensional
coordinates.

Another embodiment of the present invention is a parallel
arithmetic apparatus that selectively performs a matrix operation and a
vector inner product operation, comprising: a plurality of recording means
for recording, during the matrix operation, a first arithmetical element to
be subjected to the matrix operation and recording, during the inner
product operation, a second arithmetical element to be subjected to the
inner product operation; a plurality of operating means forming a
one-to-one correspondence with the plurality of recording means for
performing, during the matrix operation, a sum-of-products operation by
each operating means inputting the first arithmetical element recorded in
the corresponding recording means and performing, during the inner
product operation, a sum-of-products operation by a predetermined one of
the operating means inputting the second arithmetical elements recorded
in all the recording means; and selecting means for selecting, during the
matrix operation, the recording means corresponding to the
predetermined operating means and inputting the first arithmetical
element recorded in this recording means to the predetermined operating
means, and selecting, during the inner product operation, the plurality of
recording means one by one in a round-robin fashion and inputting the
second arithmetical element recorded in the selected recording means to
the predetermined operating means.

In such a parallel arithmetic apparatus, the operating means is
constructed so as to carry out a sum-of-products operation on
floating-point numbers when, for example, the arithmetical elements are
expressed with floating-point numbers.

The entertainment apparatus according to the present invention 
is an entertainment apparatus that performs image processing on an 
entertainment image by performing a matrix operation with regard to
coordinates expressing a position and shape of an object and performing
an inner product operation with regard to vectors used to express an
image of the object, comprising a plurality of registers that records, 
during the matrix operation, a first arithmetical element subjected to the 
matrix operation and records, during the inner product operation, a 
second arithmetical element subjected to the inner product operation, a 
plurality of sum-of-products operators forming a one-to-one 
correspondence with the plurality of registers that performs, during the 
matrix operation, a sum-of-products operation by each sum-of-products 
operator inputting the first arithmetical element recorded in the 
corresponding registers, and performs, during the inner product 
operation, a sum-of-products operation by a predetermined one of the
sum-of-products operators inputting the second arithmetical element 
recorded in all registers and a selector that selects, during the matrix 
operation, a register corresponding to the predetermined 
sum-of-products operator and inputs a first arithmetical element 
recorded in this register in the predetermined sum-of-products operator, 
and selects, during the inner product operation, the plurality of registers 
one by one in a round-robin fashion and inputs a second arithmetical 
element recorded in the selected register in the predetermined 
sum-of-products operator. 

Another embodiment of the present invention is an entertainment
apparatus that performs image processing on an entertainment image by
carrying out a matrix operation between a matrix and coordinate values
to perform a coordinate transformation of coordinates expressing the
position and shape of an object and carrying out an inner product
operation between a normal vector oriented in the normal direction of the
surface of the object and a position vector of a light source to determine
the display mode of the surface of the object, comprising: a plurality of
registers that record the coordinate values and component values
corresponding to any one row of the matrix during the matrix operation
and record component values of the normal vector and of the position
vector corresponding to any one component during the inner product
operation; a plurality of sum-of-products operators forming a one-to-one
correspondence with the plurality of registers that carry out a
sum-of-products operation during the matrix operation by each
sum-of-products operator inputting the coordinate values recorded in the
corresponding register and the component values corresponding to the one
row of the matrix, and carry out a sum-of-products operation during the
inner product operation by a predetermined one of the sum-of-products
operators inputting the component values of the normal vector and of the
position vector recorded in all the registers; and a selector that selects,
during the matrix operation, a register corresponding to the predetermined
sum-of-products operator and inputs the coordinate values recorded in
this register and the component values corresponding to the one row of the
matrix to the predetermined sum-of-products operator, and selects,
during the inner product operation, the plurality of registers one by one
in a round-robin fashion and inputs the component values of the normal
vector and the position vector recorded in the selected register to the
predetermined sum-of-products operator.

The processing method according to the present invention is a
processing method that allows a matrix operation and vector inner
product operation to be selectively executed and is executed by an
apparatus provided with a plurality of operating means, comprising the
steps of inputting, during the matrix operation, arithmetical elements 
subjected to the matrix operation by assigning the arithmetical elements 
to the plurality of operating means based on the features thereof to carry 
out a sum-of-products operation based on the assigned arithmetical 
elements and inputting, during the inner product operation, arithmetical 
elements subjected to the inner product operation in one predetermined 
operating means to allow the operating means to carry out a 
sum-of-products operation based on the arithmetical elements. 

The computer program according to the present invention is a 
computer program that makes it possible to selectively execute a matrix 
operation and vector inner product operation and causes a computer
provided with a plurality of operating means to execute a step of inputting, 
during the matrix operation, arithmetical elements subjected to the 
matrix operation by assigning the arithmetical elements to the plurality 
of operating means based on the features thereof to carry out a 
sum-of-products operation based on the assigned arithmetical elements 
and inputting, during the inner product operation, arithmetical elements 
subjected to the inner product operation in one predetermined operating 
means to allow the operating means to carry out a sum-of-products 
operation based on the arithmetical elements. 

The semiconductor device according to the present invention is a 
semiconductor device that makes it possible to selectively execute a 
matrix operation and vector inner product operation and is built in an 
apparatus incorporating a computer provided with a plurality of 
operating means, causing the apparatus to execute a step of inputting,
during the matrix operation, arithmetical elements subjected to the
matrix operation by assigning the arithmetical elements to the plurality 
of operating means based on the features thereof to allow each operating 
means to carry out a sum-of-products operation based on the assigned 
arithmetical elements and inputting, during the inner product operation, 
arithmetical elements subjected to the inner product operation in one 
predetermined operating means to allow the operating means to carry out 
a sum-of-products operation based on the arithmetical elements. 

BRIEF DESCRIPTION OF THE DRAWINGS 

These objects and other objects and advantages of the present 
invention will become more apparent upon reading of the following 
detailed description and the accompanying drawings in which: 

FIG. 1 is a block diagram of an entertainment apparatus; 

FIG. 2 is a block diagram of a parallel arithmetic apparatus; 

FIG. 3 is an internal block diagram of an FMAC; 

FIG. 4 is a flow chart showing a procedure for inner product 
operation processing; and 

FIG. 5 is a block diagram of a parallel arithmetic apparatus. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

An embodiment of the present invention will be specifically
described with reference to the accompanying drawings.

FIG. 1 illustrates a configuration example of an entertainment 
apparatus including a parallel arithmetic apparatus according to the 
present invention. 

This entertainment apparatus 1 is provided with two buses, a
main bus B1 and a sub bus B2, to which a plurality of semiconductor
devices each having a specific function are connected. These buses B1
and B2 are mutually connected or disconnected via a bus interface INT.

The main bus B1 is connected with a main CPU 10 which is a
main semiconductor device, a main memory 11 made up of a RAM, a
main DMAC (Direct Memory Access Controller) 12, an MPEG (Moving
Picture Experts Group) decoder (MDEC) 13 and a graphic processing unit
(hereinafter referred to as "GPU") 14 having a built-in frame memory 15
which serves as a drawing memory. The GPU 14 is connected with a
CRTC (CRT controller) 16 for generating a video output signal so that the
data drawn in the frame memory 15 can be displayed on a display
apparatus (not shown).

The CPU 10 loads a start program from the ROM 23 on the sub
bus B2 via the bus interface INT at the startup of the entertainment
apparatus 1, executes the start program and runs an operating
system. The CPU 10 also controls the media drive 27, reads an
application program or data from the medium 28 mounted in this media
drive 27 and stores it in the main memory 11. The CPU 10 further
applies the above-described geometry processing to various data read
from the medium 28, for example, three-dimensional object data
(coordinate values of vertices (typical points) of a polygon, etc.) made up
of a plurality of basic graphics (polygons) and generates a display list
containing geometry-processed polygon definition information
(specifications of the shape of the polygon used, its drawing position,
and the type, color or texture, etc. of components of the polygon).

The parallel arithmetic apparatus 100 is included in this main
CPU 10 and used when geometry processing, etc. is carried out. Details
of the parallel arithmetic apparatus 100 will be described later.

The GPU 14 is a semiconductor device having the functions of 
storing drawing context (drawing data including polygon components), 
carrying out rendering processing (drawing processing) by reading 
drawing context according to the display list notified from the main CPU 
10 and drawing polygons in the frame memory 15. The frame memory 
15 can also be used as a texture memory. Thus, a pixel image in the 
frame memory can be pasted as texture to a polygon to be drawn. 

The main DMAC 12 is a semiconductor device that carries out
DMA transfer control over the circuits connected to the main bus B1 and
also carries out DMA transfer control over the circuits connected to the
sub bus B2 according to the condition of the bus interface INT. The
MDEC 13 is a semiconductor device that operates in parallel with the
CPU 10 and has the function of expanding data compressed in MPEG
(Moving Picture Experts Group) or JPEG (Joint Photographic Experts
Group) systems, etc.

The sub bus B2 is connected to a sub CPU 20 made up of a
microprocessor, etc., a sub memory 21 made up of a RAM, a sub DMAC
22, a ROM 23 that records a control program such as an operating
system, a sound processing semiconductor device (SPU: Sound
Processing Unit) 24 that reads sound data stored in the sound memory
25 and outputs it as audio output, a communication control section (ATM)
26 that transmits/receives information to/from an external apparatus
via a network (not shown), a media drive 27 for mounting a medium 28 such
as a CD-ROM or DVD-ROM and an input device 31.

The sub CPU 20 carries out various operations according to the
control program stored in the ROM 23. The sub DMAC 22 is a
semiconductor device that carries out control such as a DMA transfer
over the circuits connected to the sub bus B2 only when the bus interface
INT separates the main bus B1 from the sub bus B2. The input device 31 is
provided with a connection terminal 32 through which an input signal
from an operating device 33 is input.

The entertainment apparatus 1 in such a configuration can carry
out, at high speed, the matrix operations and inner product operations
performed during geometry processing through the parallel arithmetic
apparatus 100 included in the main CPU 10, which will be described
below.

The parallel arithmetic apparatus 100 executes at high speed a
matrix operation between a transformation matrix and vertex coordinate
values, carried out when coordinates of polygon vertices are transformed,
and an inner product operation between a normal vector oriented in the
normal direction of the surface and a position vector of a light source,
carried out when a display condition such as brightness of the surface of
an object is determined.

<Embodiment 1>

FIG. 2 shows a configuration example of the parallel arithmetic
apparatus 100 included in the main CPU 10.

This parallel arithmetic apparatus 100a acquires coordinate
values of polygon vertices and data (arithmetical elements) necessary for
geometry processing, such as a transformation matrix used for matrix
operations, from the main memory 11 via the main bus B1 and carries out
operations.

The parallel arithmetic apparatus 100a is constructed by
including a control circuit 110, registers 120a to 120d, selectors 130a
and 130b, FMACs 140a to 140d as arithmetic units and an internal
storage device 150. The registers 120a to 120d and the internal storage
device 150 are connected via the internal bus B.

The registers 120a to 120d each form a pair with the FMACs 140a
to 140d, that is, the registers are designed to have a one-to-one
correspondence with the FMACs. To realize matrix operations using a
4 x 4 transformation matrix and inner product operations of
four-dimensional vectors, this embodiment uses four register-FMAC pairs,
but the number of pairs can be determined according to the
processing content as appropriate.

Selectors 130a and 130b are provided between the register 120a 
and FMAC 140a. 

This embodiment expresses arithmetical elements used for matrix
operations and inner product operations using floating-point numbers,
but it goes without saying that fixed-point numbers can also be used
instead. When arithmetical elements are expressed with fixed-point
numbers, sum-of-products operators for fixed-point numbers will be
used instead of the FMACs 140a to 140d.

The control circuit 110 controls the overall operation of the 
parallel arithmetic apparatus 100a. For example, the control circuit 110 
controls the recording of arithmetical elements in the registers 120a to 
120d and the operations of the selectors 130a and 130b. 

Under the control of the control circuit 110, the registers 120a to
120d take in from the internal storage device 150, and record, the
arithmetical elements assigned to the respective registers from among the
arithmetical elements used for operations such as matrix operations or
inner product operations, such as component values of a transformation
matrix, coordinate values of coordinates to be transformed and vector
component values.

When an inner product operation of four-dimensional vectors is
carried out, the registers 120a to 120d take in and record component
values assigned to the respective registers as arithmetical elements from
among component values of two four-dimensional vectors. For example,
of the two four-dimensional vectors (Ax, Ay, Az, Aw) and (Bx, By, Bz, Bw),
the register 120a records component values Ax and Bx, the register
120b records component values Ay and By, the register 120c records
component values Az and Bz and the register 120d records component
values Aw and Bw.

When a matrix operation is carried out using a 4 x 4
transformation matrix, the registers 120a to 120d take in and record, as
arithmetical elements, the coordinate values of the four-dimensional
coordinates to be transformed and the component values of the row of the
transformation matrix assigned to the respective registers. For example,
the registers 120a to 120d record component values of the transformation
matrix in addition to coordinate values of the four-dimensional
coordinates; the register 120a records the component values of the 1st
row of the transformation matrix, the register 120b records the
component values of the 2nd row of the transformation matrix, the
register 120c records the component values of the 3rd row of the
transformation matrix and the register 120d records the component
values of the 4th row of the transformation matrix as their respective
arithmetical elements. The registers 120a to 120d each record a pair of
the 1st column component value of each row of the transformation matrix
and the 1st component value of the four-dimensional coordinate to be
transformed, a pair of the 2nd column component value and the 2nd
component value, a pair of the 3rd column component value and the 3rd
component value and a pair of the 4th column component value and the
4th component value, and these values are read one pair at a time.

Furthermore, the registers 120a to 120d record calculation
results of the FMACs 140a to 140d each forming a pair with the registers
120a to 120d.
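The register contents described above for the two kinds of operation can
be laid out, as a sketch only, in the following Python fragment. Index 0
to 3 stands for the registers 120a to 120d; the function names
load_for_inner_product and load_for_matrix are hypothetical and the list
representation is an assumption made for illustration.

```python
def load_for_inner_product(a, b):
    """Register i holds the component pair (A_i, B_i)."""
    return [(a[i], b[i]) for i in range(4)]


def load_for_matrix(matrix, vertex):
    """Register i holds four pairs (M[i][j], V[j]), read one pair at a time."""
    return [[(matrix[i][j], vertex[j]) for j in range(4)] for i in range(4)]


ip_regs = load_for_inner_product([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
# ip_regs[1] == (2.0, 6.0): the second register holds Ay and By

m = [[float(4 * i + j) for j in range(4)] for i in range(4)]
mx_regs = load_for_matrix(m, [10.0, 20.0, 30.0, 40.0])
# mx_regs[2][3] == (11.0, 40.0): third row, 4th column paired with Vw
```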

The selectors 130a and 130b select one of the registers 120a to
120d, take in an arithmetical element recorded in the selected
register and supply the arithmetical element to the FMAC 140a. When
an inner product operation is carried out, the selectors 130a and 130b
select one of the registers 120a to 120d in a round-robin fashion, take in
an arithmetical element recorded in the selected register and supply
the arithmetical element to the FMAC 140a. When a matrix operation is
carried out, the selectors 130a and 130b always select the register 120a,
take in the arithmetical element recorded in the register 120a and
supply the arithmetical element to the FMAC 140a.

The selectors 130a and 130b select a register indicated by the
control circuit 110 based on the content of the operation being carried
out at that time, the progress of the operation, etc.
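One possible way to picture the indication given by the control circuit
110 is as a selection schedule, i.e. the sequence of registers selected
over successive steps. The sketch below is an assumption-laden model; the
name selection_schedule, the operation strings and the one-selection-per-step
timing are not taken from this Specification.

```python
REGISTERS = ["120a", "120b", "120c", "120d"]


def selection_schedule(operation, steps=4):
    """List which register the selectors 130a and 130b pick at each step."""
    if operation == "matrix":
        # The selectors always pick the register of their own pair.
        return [REGISTERS[0]] * steps
    if operation == "inner_product":
        # The selectors walk through all registers in round-robin order.
        return [REGISTERS[step % len(REGISTERS)] for step in range(steps)]
    raise ValueError(f"unknown operation: {operation}")


matrix_steps = selection_schedule("matrix")
# ['120a', '120a', '120a', '120a']
inner_steps = selection_schedule("inner_product")
# ['120a', '120b', '120c', '120d']
```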

The FMACs 140a to 140d take in two arithmetical elements
recorded in the registers 120a to 120d and multiply and add up the two
arithmetical elements.

FIG. 3 is an internal block diagram of the FMAC 140a. Since the
other FMACs 140b to 140d also have the same configuration, only the
configuration of the FMAC 140a will be explained here and explanations
of the other FMACs 140b to 140d will be omitted.

In order to multiply and add up the arithmetical elements taken in,
the FMAC 140a is provided with a floating-point number multiplier
(FMUL: Floating MULtiply) 141 and a floating-point number adder
(FADD: Floating ADDer) 142. The two arithmetical elements taken in
are multiplied by the FMUL 141 first. The multiplication result is sent to
the FADD 142. The FADD 142 adds up the multiplication results sent
from the FMUL 141 one by one.

For example, when a0 to an and b0 to bn are taken in one after
another as arithmetical elements, the FMAC 140a obtains the following
calculation result:

a0·b0+a1·b1+a2·b2+ ... +a(n-1)·b(n-1)+an·bn

The FMACs 140a to 140d output the calculation results to the 
registers that form their respective pairs. 
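The multiply-then-accumulate behavior of an FMAC can be sketched as a
small Python class. The class below is a behavioral model only: the method
names fmul, fadd and step are hypothetical, and ordinary Python floats stand
in for the hardware floating-point format.

```python
class FMAC:
    """Behavioral model of a floating-point sum-of-products operator."""

    def __init__(self):
        self.acc = 0.0  # running sum held by the adder stage

    def fmul(self, a, b):
        """Multiplier stage (FMUL 141 in the text)."""
        return a * b

    def fadd(self, product):
        """Adder stage (FADD 142 in the text): accumulate one product."""
        self.acc += product
        return self.acc

    def step(self, a, b):
        """Take in two arithmetical elements, multiply, then accumulate."""
        return self.fadd(self.fmul(a, b))


fmac = FMAC()
for a, b in zip([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    result = fmac.step(a, b)
# result == 32.0, i.e. 1*4 + 2*5 + 3*6
```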

Using the selectors 130a and 130b, the FMACs 140a to 140d
perform the following operations during an inner product operation and
matrix operation.

When an inner product operation is carried out, the FMAC 140a
multiplies component values of the components of two vectors supplied
from the registers 120a to 120d via the selectors 130a and 130b and adds
up the multiplication results one by one. Furthermore, it is also
possible to count the number of times these multiplications and
additions are performed, make the progress of the inner
product operation visible and prevent the next instruction from starting
until the inner product operation is completed.
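The counting of multiplications and additions, and the holding back of the
next instruction, might be modeled as below. This is a sketch under the
assumption that one multiply-accumulate is performed per clock; the class
InnerProductUnit, its busy property and the helper run_then_issue_next are
hypothetical names introduced only for this illustration.

```python
class InnerProductUnit:
    """Model of the FMAC 140a with a progress counter."""

    def __init__(self, registers):
        self.registers = registers  # one (A_i, B_i) pair per register
        self.count = 0              # multiply-accumulate steps done so far
        self.acc = 0.0

    @property
    def busy(self):
        """True while the inner product operation is still in progress."""
        return self.count < len(self.registers)

    def clock(self):
        """Perform one multiply-accumulate on the next register in turn."""
        if self.busy:
            a, b = self.registers[self.count]
            self.acc += a * b
            self.count += 1


def run_then_issue_next(unit):
    """Hold the next instruction until the unit reports completion."""
    cycles = 0
    while unit.busy:
        unit.clock()
        cycles += 1
    return cycles, unit.acc


unit = InnerProductUnit([(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (4.0, 8.0)])
cycles, value = run_then_issue_next(unit)  # (4, 70.0)
```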

When a matrix operation is carried out, the FMACs 140a to 140d
multiply component values of the transformation matrix taken in from
the corresponding registers 120a to 120d by coordinate values of the
four-dimensional coordinates which form pairs and add up the
multiplication results one by one.
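The lane-by-lane accumulation during a matrix operation can be sketched as
follows. The inner loop over lanes stands for work that the four FMACs
perform in parallel in hardware; in this software model it is simply run
sequentially. The function name matrix_operation and the sample translation
matrix are illustrative assumptions.

```python
def matrix_operation(matrix, vertex):
    """Each lane accumulates one row of the 4 x 4 matrix times the vertex."""
    accumulators = [0.0] * 4
    for j in range(4):         # one (matrix element, coordinate) pair per step
        for lane in range(4):  # lanes 0..3 model the FMACs 140a to 140d
            accumulators[lane] += matrix[lane][j] * vertex[j]
    return accumulators


# Translation by (2, 3, 4) applied to the vertex (1, 1, 1, 1).
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
moved_vertex = matrix_operation(translate, [1.0, 1.0, 1.0, 1.0])
# moved_vertex == [3.0, 4.0, 5.0, 1.0]
```

After four steps every lane holds one component of the transformed
coordinates, so no values need to be exchanged between lanes.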

The internal storage device 150 takes in data necessary for geometry
processing, such as coordinate values of polygon vertices, component
values of the transformation matrix used for matrix operations and vector
component values, from the main memory 11 and records these
values under the control of the control circuit 110. Furthermore, the
internal storage device 150 takes in and records the calculation results
from the registers 120a to 120d. The calculation results are sent to the
main memory 11 via the internal storage device 150.

A direct memory access transfer is performed between the internal 
storage device 150 and the main memory 11, which allows high-speed 
data transmission/reception and is convenient for image processing and 
other processing that handles large volumes of data. 

The processing procedure when the parallel arithmetic apparatus 
100a carries out the inner product operation in mathematical expression 
2, that is, the inner product operation between vector A (Ax, Ay, Az, Aw) 
and vector B (Bx, By, Bz, Bw) will be explained. FIG. 4 is a flow chart 
showing such a processing procedure. 

The parallel arithmetic apparatus 100a takes in the component 
values of the vector A (Ax, Ay, Az, Aw) and vector B (Bx, By, Bz, Bw) stored 
in the main memory 11 through a direct memory access transfer and 
records the component values in the internal storage device 150 (step 
S101). 

The registers 120a to 120d take in the component values assigned 
to the respective registers from among the component values of the vector 
A (Ax, Ay, Az, Aw) and vector B (Bx, By, Bz, Bw) stored in the internal 
storage device 150. That is, the register 120a takes in Ax and Bx, the 
register 120b takes in Ay and By, the register 120c takes in Az and Bz and 
the register 120d takes in Aw and Bw (step S102). 

The selectors 130a and 130b select one of the registers 120a to 
120d, take in the component values of vector A and vector B 
recorded in the selected register and supply the component values to the 
FMAC 140a. The control circuit 110 determines which of the registers 
120a to 120d should be selected according to the progress of 
the inner product operation. The selectors 130a and 130b select one of 
the registers 120a to 120d under the control of the control circuit 110. 
Here, the selectors 130a and 130b first select the register 120a, take in 
Ax and Bx and supply Ax and Bx to the FMAC 140a (step S103). The 
FMAC 140a performs a sum-of-products operation between Ax and Bx 
using the FMUL 141 and FADD 142 (step S104). Before the first 
sum-of-products operation is carried out, the internal state of the FMAC 
140a is cleared. 

After the sum-of-products operation, the FMAC 140a determines 
whether the inner product operation has been completed or not (step 
S105). Whether the inner product operation has been completed or not 
can be determined from the number of component values of the 
vectors subjected to the inner product operation. The number of times a 
sum-of-products operation is performed is counted, and the inner 
product operation is determined to have been completed when the count 
equals the number of component values of the input vectors. 
The count also makes it possible to know the register from which the 
next component value should be extracted. The result of determination 
as to whether the inner product operation has been completed or not is 
sent to the control circuit 110. 

In this case, the inner product operation has not been completed 
yet (step S105: N), and therefore the control circuit 110 causes the 
selectors 130a and 130b to select the register 120b. The selectors 130a 
and 130b select the register 120b under the control of the control circuit 
110, take in Ay and By and supply Ay and By to the FMAC 140a. When 
the FMAC 140a takes in Ay and By, the FMUL 141 and FADD 142 
perform a sum-of-products operation to obtain Ax·Bx + Ay·By. Likewise, 
step S103 to step S105 are repeated until the inner product operation 
is completed to obtain Ax·Bx + Ay·By + Az·Bz + Aw·Bw. 

Upon determining that the inner product operation has been 
completed (step S105: Y), the FMAC 140a outputs the calculation result 
to the register 120a (step S106). After the output, the FMAC 140a clears 
the internal state (step S107). The output calculation result is input 
from the register 120a to the internal storage device 150 and sent to the 
main memory 11. 

This completes the inner product operation. 
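
The flow of steps S103 to S107 can be sketched as a small simulation. 
This is only an illustrative model under stated assumptions: the 
function name inner_product, the representation of each register as an 
(A component, B component) pair, and the returned selection order are 
hypothetical and are not part of the specification. The loading steps 
S101 and S102 are assumed to have already filled the register list. 

```python
def inner_product(registers):
    """Simulate steps S103 to S107 for one inner product operation.

    registers: list of (a, b) pairs, one pair per register 120a to 120d,
    assumed to be already loaded (steps S101 and S102).
    Returns the result and the order in which registers were selected.
    """
    if not registers:
        raise ValueError("at least one register pair is required")

    acc = 0.0    # internal state of FMAC 140a, cleared before the first step
    count = 0    # number of sum-of-products operations performed so far
    order = []   # register indices in the order the selectors chose them

    while True:
        a, b = registers[count]      # S103: the count identifies the register
        order.append(count)
        acc += a * b                 # S104: multiply, then accumulate
        count += 1
        if count == len(registers):  # S105: done when count equals the
            break                    #       number of component values

    result = acc                     # S106: output the calculation result
    acc = 0.0                        # S107: clear the internal state
    return result, order
```

With A = (1, 2, 3, 4) and B = (5, 6, 7, 8) loaded as the pairs 
(1, 5), (2, 6), (3, 7), (4, 8), the sketch selects the registers in the 
order 0, 1, 2, 3 and accumulates 5 + 12 + 21 + 32. 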

Providing the selectors 130a and 130b allows calculations 
between component values of different components, making it easier to 
carry out inner product operations. The selectors 130a and 130b are 
provided between the register 120a and FMAC 140a, but this 
embodiment is not limited to this, and the selectors 130a and 130b can 
also be provided between the register 120b and FMAC 140b, between the 
register 120c and FMAC 140c or between the register 120d and FMAC 
140d. 

When a matrix operation is performed, the selectors 130a and 
130b always select the register 120a, only supply the arithmetical 
element recorded in the register 120a to the FMAC 140a and never supply 
the arithmetical elements recorded in the other registers 120b to 120d to 
the FMAC 140a. The arithmetical elements recorded in the other 
registers 120b to 120d are taken into the FMACs 140b to 140d with 
which the registers 120b to 120d form their respective pairs and 
processed. 

For example, when the matrix operation in mathematical 
expression 1 is carried out, the register 120a records the component 
values (M11, M12, M13, M14) of the 1st row of the transformation matrix 
and the coordinate values (Vx, Vy, Vz, Vw) of the four-dimensional 
coordinates. The register 120b records the component values (M21, 
M22, M23, M24) of the 2nd row of the transformation matrix and the 
coordinate values (Vx, Vy, Vz, Vw) of the four-dimensional coordinates. 
The register 120c records the component values (M31, M32, M33, M34) of 
the 3rd row of the transformation matrix and the coordinate values (Vx, 
Vy, Vz, Vw) of the four-dimensional coordinates. The register 120d 
records the component values (M41, M42, M43, M44) of the 4th row of 
the transformation matrix and the coordinate values (Vx, Vy, Vz, Vw) of 
the four-dimensional coordinates. 

The FMACs 140a to 140d sequentially take in the component 
values and coordinate values recorded in the registers 120a to 120d with 
which the FMACs 140a to 140d form their respective pairs and carry out 
operations. Suppose the FMAC 140a is taken as an example. The 
FMAC 140a takes in M11 and Vx from the register 120a via the selectors 
130a and 130b and calculates M11·Vx using the FMUL 141. The 
FMAC 140a sends this to the FADD 142. Then, the FMAC 140a takes 
in M12 and Vy and calculates M12·Vy, sends this to the FADD 142 and 
calculates M11·Vx + M12·Vy. Then, the FMAC 140a carries out the same 
calculation on M13 and Vz, and M14 and Vw and calculates 
M11·Vx + M12·Vy + M13·Vz + M14·Vw. The other FMACs 140b to 140d 
carry out the same operations. Thus, the FMACs 140a to 140d carry out 
operations in parallel, thereby executing 4 * 4 matrix operations at the 
same speed as the conventional art. 
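
The row-per-FMAC arrangement described above can be sketched as 
follows. This is a sequential software model only; the function name 
matrix_operation and the list-of-rows layout are assumptions made for 
illustration, and the inner loop stands in for work that the four 
FMACs would carry out at the same time. 

```python
def matrix_operation(matrix, vector):
    """Multiply a transformation matrix by a coordinate vector.

    matrix: list of rows; row i plays the role of the register/FMAC pair i
            (register 120a with FMAC 140a, register 120b with FMAC 140b, ...).
    vector: coordinate values (Vx, Vy, Vz, Vw).
    """
    accumulators = [0.0] * len(matrix)  # one running sum per FMAC

    # One outer iteration per coordinate value: every FMAC consumes the
    # same coordinate together with its own row's matrix component.
    for column, coordinate in enumerate(vector):
        for row in range(len(matrix)):  # conceptually simultaneous in hardware
            accumulators[row] += matrix[row][column] * coordinate

    return accumulators
```

For a translation matrix with rows (1, 0, 0, 2), (0, 1, 0, 3), 
(0, 0, 1, 4), (0, 0, 0, 1) and the vector (1, 1, 1, 1), each 
accumulator ends with the sum M(i,1)·Vx + M(i,2)·Vy + M(i,3)·Vz + 
M(i,4)·Vw of its own row. 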

As described above, the parallel arithmetic apparatus 100a is an 
apparatus that selectively carries out a matrix operation and vector inner 
product operation. The parallel arithmetic apparatus 100a is provided 
with at least the registers 120a to 120d that record component values of a 
transformation matrix as arithmetical elements during the matrix 
operation and record vector component values as arithmetical elements 
during the inner product operation, the FMACs 140a to 140d that take in 
the arithmetical elements recorded in the registers 120a to 120d and 
carry out sum-of-products operations, and the selectors 130a and 130b 
that select one register from the registers 120a to 120d and supply the 
arithmetical elements recorded in the selected register to the FMAC 
140a. The registers 120b to 120d form a one-to-one correspondence 
with the FMACs 140b to 140d. During the matrix operation, the selectors 
130a and 130b supply component values of the transformation matrix 
recorded in the register 120a to the FMAC 140a. During the inner 
product operation, the selectors 130a and 130b select the 
registers 120a to 120d one by one in a round-robin fashion and supply 
the vector component value recorded in the selected register to the FMAC 
140a. 
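
The selection rule summarized above can be written as a short sketch. 
The function name select_register, the mode strings "matrix" and 
"inner_product", and the use of index 0 for the register 120a are 
assumptions made for illustration; the specification describes the 
behaviour but does not define such an interface. 

```python
def select_register(mode, step, num_registers=4):
    """Return the index of the register feeding FMAC 140a at a given step.

    mode: "matrix" or "inner_product" (illustrative labels).
    step: number of sum-of-products operations already performed.
    Index 0 stands for register 120a, index 1 for 120b, and so on.
    """
    if mode == "matrix":
        return 0                     # always the register paired with FMAC 140a
    if mode == "inner_product":
        return step % num_registers  # round-robin over all registers
    raise ValueError("unknown mode: " + repr(mode))
```

Over four steps, the matrix mode yields the indices 0, 0, 0, 0 and the 
inner product mode yields 0, 1, 2, 3. 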

Providing the selectors 130a and 130b in this way makes it 
possible to carry out the matrix operation and inner product operation 
selectively. 
<Embodiment 2> 

FIG. 5 is a block diagram of a parallel arithmetic apparatus 100b 
according to another embodiment. 

Compared to the parallel arithmetic apparatus 100a shown in FIG. 
2, the parallel arithmetic apparatus 100b is only different in that 
temporary registers 160b to 160d are provided at the output ends of the 
registers 120b to 120d. 

This parallel arithmetic apparatus 100b is constructed of registers 
120a to 120d that record arithmetical elements, FMACs 140a to 140d 
that carry out sum-of-products operations based on the arithmetical 
elements recorded in these registers 120a to 120d, selectors 130a and 
130b inserted between the register 120a and FMAC 140a and temporary 
registers 160b to 160d inserted between the registers 120b to 120d and 
the FMACs 140b to 140d. The selectors 130a and 130b select one from 
among the register 120a and the temporary registers 160b to 160d and 
input the arithmetical element recorded in the selected register 120a or 
temporary register 160b to 160d to the FMAC 140a. Operations of these 
components are controlled by the control circuit 110. 

The temporary registers 160b to 160d have a one-to-one 
correspondence with the registers 120b to 120d. The temporary 
registers 160b to 160d temporarily store the arithmetical elements 
recorded in their respective registers 120b to 120d when these are sent to 
the FMACs 140b to 140d or the selectors 130a and 130b. 

Since the temporary registers 160b to 160d temporarily record the 
arithmetical elements from the registers 120b to 120d, even if the 
arithmetical elements are not taken from the registers 120b to 120d into 
the FMAC 140a at the same timing as in the case of the inner product 
operation, the read ports of the registers 120b to 120d are not occupied 
by the arithmetical elements for inner product operations. Thus, while 
the FMAC 140a is carrying out a matrix operation, the other FMACs 140b 
to 140d can take in the next arithmetical elements from the registers 
120b to 120d and carry out sum-of-products operations. 

The above-described embodiments have described the 
entertainment apparatus using the parallel arithmetic apparatus 100 as 
an example, but the present invention is not limited to this and the 
parallel arithmetic apparatus of the present invention can be applied to 
any information processor which carries out parallel arithmetic processing 
and carries out at least matrix operations and vector inner product 
operations. Moreover, the number of pairs of register and 
sum-of-products operator (FMAC) is not limited to 4, but that number of 
pairs can be determined according to the processing carried out by the 
relevant apparatus. 

Furthermore, the parallel arithmetic apparatus 100 can also be 
implemented by causing a computer to execute the computer program 
of the present invention. This embodiment forms functional blocks 
corresponding to the selectors 130a and 130b on the computer with a 
plurality of FMACs through a co-operation between the computer 
program recorded in a computer-accessible recording medium such as a 
disk device or semiconductor memory and a control program (OS, etc.) 
incorporated in the computer. 

As described above, the present invention can perform vector 
inner product operations easily while performing matrix operations as 
efficiently as the conventional art. 

Various embodiments and changes may be made thereunto 
without departing from the broad spirit and scope of the invention. The 
above-described embodiment is intended to illustrate the present invention, 
not to limit the scope of the present invention. The scope of the present 
invention is shown by the attached claims rather than the embodiment. 
Various modifications made within the meaning of an equivalent of the 
claims of the invention and within the claims are to be regarded to be in 
the scope of the present invention. 



